Lower WERs do not guarantee better transcriptions
Abstract
The goal of this paper is to investigate how various properties of a continuous speech recognizer (CSR) affect automatic transcription. To this end, we used several versions of a CSR to generate automatic transcriptions. Our results show that changing certain properties of the CSR affects the resulting automatic transcriptions. The best results were obtained with ‘short’ hidden Markov models (HMMs) and with context-independent HMMs. Furthermore, we found that minimizing the amount of contamination in the HMMs improves the quality of the automatic transcriptions. Another important result is that there does not appear to be a straightforward relation between word error rate (WER) and transcription quality. In other words, a CSR with a lower WER does not guarantee better transcriptions.
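For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the abstract does not say how the authors computed WER or how they scored transcription quality.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization by whitespace is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with nothing: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because this score counts only whole-word errors, it carries no direct information about how well the recognizer labels or aligns individual sounds, which is one way to read the abstract's finding that a lower WER need not yield better transcriptions.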
Similar resources
On automatic phonetic transcription quality: lower word error rates do not guarantee better transcriptions
The first goal of this study was to investigate the effect of changing several properties of a continuous speech recognizer (CSR) on the automatic phonetic transcriptions generated by the same CSR. Our results show that the quality of the automatic transcriptions can be improved by using short hidden Markov models (HMMs) and by reducing the amount of contamination in the HMMs. The amount of con...
Extracting Semantically-coherent Keyphrases from Speech
Previous methods for extracting keyphrases from spoken audio have used text-based summarisation techniques on automatic speech transcription. The method of Désilets et al. (2000) was found to produce accurate keyphrases for transcriptions with Word Error Rates (WER) of the order of 25%, but performance was less than ideal for transcripts with WERs of the order of 60%. With such transcripts, a l...
Validation of phonetic transcriptions in the context of automatic speech recognition
Some of the speech databases and large spoken language corpora that have been collected during the last fifteen years have been (at least partly) annotated with a broad phonetic transcription. Such phonetic transcriptions are often validated in terms of their resemblance to a handcrafted reference transcription. However, there are at least two methodological issues questioning this validation m...
Automatic transcription of football commentaries in the MUMIS project
This paper describes experiments carried out to automatically transcribe football commentaries in Dutch, English and German for multimedia indexing. Our results show that the high levels of stadium noise in the material create a task that is extremely difficult for conventional ASR. The baseline WERs vary from 83% to 94% for the three languages investigated. Employing state-of-the-art noise rob...
Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems
This paper compares schemes for the selection of multi-genre broadcast data and corresponding transcriptions for speech recognition model training. Selections of the same amount of data (700 hours) from lightly supervised alignments based on the same original subtitle transcripts are compared. Data segments were selected according to a maximum phone matched error rate between the lightly superv...